DETAILED ACTION
Response to Arguments 

1. Applicant's arguments filed September 8, 2005 have been fully considered but
they are not persuasive. 

The Applicant presented one argument relating to the ability to compare time 
occurrences of two cues selected from the identified cues and the ability to determine 
the proximity of the occurrences of the two selected cues and classify the program 
based on the proximity of the occurrence of the two selected cues. The Applicant 
states, "Wei, Gang teaches a system wherein the duration of text (and face) information 
is used to determine the type of program". While Wei, Gang does teach that the 
duration of text is used to determine the type of program, he also teaches that a number
(per unit time) of text occurrences are used to create a feature space (in other words, 
they are used to classify programs). The number per unit time of text occurrences in 
the program clearly reads on the proximity feature of two text cues (occurrences) and 
the ability to classify the program according to the proximity of the two text occurrences. 
Wei, Gang even states that text and faces often occur at the same time in news. In 
soaps and sitcoms text regions are rare, and usually there are many close-up face 
shots lasting for a long time in soaps while in sitcoms faces have smaller size and 
shorter duration. This realization clearly shows that Wei, Gang recognizes the need for 
tracking text (and face) occurrences and their proximity to one another for the purpose 
of classifying programs into feature spaces. The arguments relating to these features 
are therefore not persuasive. 

The following art rejection, which is made FINAL, is mostly copied and pasted
from the previous Office Action, with minor changes made to reflect current claim 
amendments. 



Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

2. Claims 1-4 and 6-8 are rejected under 35 U.S.C. 102(e) as being anticipated by 
Wei, Gang et al ("TV program classification based on face and text processing"), 
previously cited by applicant. 

To serve as a brief overview, the Wei, Gang reference discloses a system for 
classification of TV programs based on face and text processing. In this instance, the 
face processing does not come into consideration; however, the text processing 
according to extracted transcript information comes into consideration when classifying 
input audio/video programs. Extracted text cues and domain-knowledge are used to aid 
in the classification process. 

Regarding claim 1, the claimed "method for classification of a program" is met as
follows: 

• The claimed step of "receiving an audio/video signal corresponding to the
program" is met by the statement that "consumers today are receiving 
increased numbers of channels" [page 1345, paragraph 2]. 

• The claimed step of "determining transcript information associated with the 
program using the audio/video signal" is met by the text tracking and 
extraction being performed on frames in order to extract and receive text 
from the video signal [page 1346, section 2.2]. 

• The claimed step of "identifying at least one cue in the transcript 
information and an associated time of occurrence, each of the cues being 
associated with a type of program" is met by the text that is a helpful cue 
in recognizing the type of a TV program [page 1345, paragraph 4]. Also, 
the trajectory being used in the text tracking method (section 2.2) notes a 
time of occurrence. Wei, Gang utilizes a "text tracking" to track text and to 
consider the text if it falls into an appropriate trajectory. In section 3.1, the
reference states, "the number and average duration of the 'survived'
trajectories constitute additional dimensions in the feature space." Wei,
Gang teaches that a number (per unit time) of text occurrences are used 
to create a feature space (in other words, they are used to classify 
programs). 

• The claimed step of "correlating the at least one cue identified in the 
transcript information to the type of program" is met by the inherent 
correlation between the text cues and the type of TV program
associated with the text cues. The reference states "text is a helpful cue
in recognizing certain types of TV programs". This recognition of the TV 
program type inherently teaches a correlation between the text cue and 
the TV program type [page 1345, paragraph 4]. 

• The claimed step of "comparing the time of occurrence of two cues 
selected from the at least one identified cue and determining a proximity of 
occurrence of the two selected cues" is met by the same trajectory, which 
samples different text trackings and establishes, based on duration and 
number, if they should be utilized. The number per unit time of text 
occurrences in the program clearly reads on the proximity feature of two 
text cues (occurrences) and the ability to classify the program according to 
the proximity of the two text occurrences. Wei, Gang even states that text 
and faces often occur at the same time in news. In soaps and sitcoms 
text regions are rare, and usually there are many close-up face shots 
lasting for a long time in soaps while in sitcoms faces have smaller size 
and shorter duration. This realization clearly shows that Wei, Gang 
recognizes the need for tracking text (and face) occurrences and their 
proximity to one another for the purpose of classifying programs into 
feature spaces. 

• The claimed step of "classifying the program based on the proximity of 
occurrence of the two selected cues" is met by the classification of a TV 
program into a category based on the extracted text cue [page 1345,
paragraph 4], and their proximity to one another, as discussed above. 
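
For illustration only, the trajectory-based features quoted above from section 3.1 (the number of surviving text trajectories per unit time and their average duration) can be sketched as follows. This sketch is not taken from the Wei, Gang reference or from the application; the `Trajectory` structure, the `min_duration` survival criterion, and all other names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """A tracked text region (hypothetical structure); times are in seconds."""
    start: float
    end: float


def trajectory_features(trajectories, program_length, min_duration=1.0):
    """Return (count per unit time, average duration) of surviving trajectories.

    Assumption: a trajectory "survives" if it lasts at least min_duration
    seconds. The reference may use a different survival criterion.
    """
    if program_length <= 0:
        raise ValueError("program_length must be positive")
    survived = [t for t in trajectories if t.end - t.start >= min_duration]
    if not survived:
        return 0.0, 0.0
    rate = len(survived) / program_length
    average_duration = sum(t.end - t.start for t in survived) / len(survived)
    return rate, average_duration
```

The two returned values would serve as two dimensions of a feature vector that a downstream classifier could use to separate program types.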

Regarding claim 2, the claimed step of "receiving an audio/data/video signal 
which includes the transcript information" is met by page 1345, section 1, wherein the 
reference discloses the step of receiving an audio/video feed and extracting the 
transcript from the received audio/video feed. 

Regarding claim 3, the claimed feature "wherein if the proximity of occurrence is 
greater than a predetermined amount, the two selected cues are ignored in connection 
with determining the program classification" is met by the trajectory, which classifies the 
program based on the proximity of the text occurrences. As the reference states, text 
and faces often occur at the same time in news. In soaps and sitcoms text regions are 
rare, and usually there are many close-up face shots lasting for a long time in soaps 
while in sitcoms faces have smaller size and shorter duration. This realization clearly 
shows that Wei, Gang recognizes the need for tracking text (and face) occurrences and 
their proximity to one another for the purpose of classifying programs into feature 
spaces. The claimed feature "wherein if the proximity of occurrence is not greater than 
the predetermined amount, the two selected cues are utilized in connection with 
determining the classification" is also met by the trajectory, which uses the number and 
average duration to establish which text trackings are utilized for classification. 
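
For illustration only, the proximity limitation recited in claim 3 can be sketched as follows: a pair of cues is ignored when the gap between their times of occurrence exceeds a predetermined amount, and is otherwise used in determining the classification. The sketch is not drawn from the application or from either reference; pairing consecutive cues, the simple vote count, and the names `classify_by_proximity`, `cue_types`, and `max_gap` are all assumptions.

```python
def classify_by_proximity(cues, cue_types, max_gap):
    """Classify a program from timed cues (hypothetical sketch).

    cues      -- list of (time_in_seconds, cue_text) pairs
    cue_types -- dict mapping cue_text to a program type
    max_gap   -- the "predetermined amount" in seconds

    Returns the program type with the most votes, or None if no pair of
    cues is close enough to be used.
    """
    # Keep only cues that are associated with a program type, in time order.
    known = sorted((t, c) for t, c in cues if c in cue_types)
    votes = {}
    # Assumption: only consecutive cues are compared as "two selected cues".
    for (t1, c1), (t2, c2) in zip(known, known[1:]):
        if t2 - t1 > max_gap:
            continue  # proximity greater than the threshold: pair is ignored
        for cue in (c1, c2):
            program_type = cue_types[cue]
            votes[program_type] = votes.get(program_type, 0) + 1
    if not votes:
        return None
    return max(votes, key=votes.get)
```

With a 30-second threshold, two news-related cues three seconds apart would contribute to the classification, while cues several minutes apart would be disregarded.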

Regarding claim 4, the claimed "classification of the program is one of a news 
program, talk show, sports program, panel discussions, interviews, and situational 
comedy" is met by the teaching of four categories, namely news, commercial, sitcom, 
and soap (page 1345, paragraph 4). The reference also suggests that the system can be extended
to recognize more categories by adding new classification rules. 

Regarding claim 6, the claimed "apparatus for classification of a program" is met 
as follows: 

• The claimed "receiver to receive an audio/data/video signal corresponding 
to the program" is met by the statement that "consumers today are 
receiving increased numbers of channels" [page 1345, paragraph 2]. 

• The claimed "transcript information extractor for extracting transcript 
information associated with the program from the audio/data/video input 
signal" is met by the text tracking and extraction being performed on 
frames in order to extract and receive text from the video signal [page 
1346, section 2.2]. 

• The claimed "cue extractor for identifying at least one cue of a plurality of 
cues in the transcript information and an associated time of occurrence, 
each of the plurality of cues having associated therewith a type of 
program" is met by the text that is a helpful cue in recognizing the type of 
a TV program [page 1345, paragraph 4]. Also, the trajectory being used in 
the text tracking method (section 2.2) notes a time of occurrence. Wei, 
Gang utilizes a "text tracking" to track text and to consider the text if it falls 
into an appropriate trajectory. In section 3.1, the reference states, "the
number and average duration of the 'survived' trajectories constitute 
additional dimensions in the feature space." Wei, Gang teaches that a 



Application/Control Number: 09/739,476 Page 8 

Art Unit: 2614 

number (per unit time) of text occurrences are used to create a feature 
space (in other words, they are used to classify programs). 

• The claimed "knowledge database for correlating the at least one cue of 
the plurality of cues identified in the transcript information to the type of 
program" is met by the inherent correlation between the text cues and the 
type of TV program associated with the text cues. The reference states
"text is a helpful cue in recognizing certain types of TV programs". This 
recognition of the TV program type inherently teaches a correlation 
between the text cue and the TV program type [page 1345, paragraph 4]. 

• The claimed "temporal database for comparing the time of occurrence of 
two selected cues of the at least one cue to determine a proximity of 
occurrence of the two selected cues" is met by the same trajectory, which 
samples different text trackings and establishes, based on duration and 
number, if they should be utilized. The number per unit time of text 
occurrences in the program clearly reads on the proximity feature of two 
text cues (occurrences) and the ability to classify the program according to 
the proximity of the two text occurrences. Wei, Gang even states that text 
and faces often occur at the same time in news. In soaps and sitcoms 
text regions are rare, and usually there are many close-up face shots 
lasting for a long time in soaps while in sitcoms faces have smaller size 
and shorter duration. This realization clearly shows that Wei, Gang 
recognizes the need for tracking text (and face) occurrences and their 
proximity to one another for the purpose of classifying programs into
feature spaces. 

• The claimed "classifier for classifying the program based on the proximity 
of occurrence" is met by the classification of a TV program into a category 
based on the extracted text cue [page 1345, paragraph 4], and their 
proximity to one another, as discussed above.

Regarding claim 7, see the above rejection of claim 3.

Regarding claim 8, the claimed "classification of the program is one of a news
program, talk show, sports program, panel discussions, interviews, and situational 
comedy" is met by the teaching of four categories, namely news, commercial, sitcom, 
and soap (page 1345, paragraph 4). The reference also suggests that the system can be extended
to recognize more categories by adding new classification rules. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 5 and 9 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Wei, Gang et al ("TV Program Classification Based on Face and Text Processing"), 
previously cited by applicant, in view of Wei, Qi et al ("Integrating visual, audio and text 
analysis for news video"), previously cited by examiner.

Regarding claim 5, the Wei, Gang et al reference discloses all of that which is
discussed above with regard to claim 1. The Wei, Gang reference does not disclose,
"Transcript information comprises closed-captioned text". The Wei, Qi reference 
discloses a similar situation to that of Wei, Gang, wherein the text information for 
classifying a news program comes in the form of closed-caption text [page 2, paragraph
2]. It would have been obvious to one of ordinary skill in the art at the time of the
invention to utilize closed-caption text as the transcript information for the program to be
classified, in order to utilize an already existing technology (closed captioning) that is
easy to parse and easy to work with for the purposes of classification.

Regarding claim 9, the Wei, Gang et al reference discloses all of that which is 
discussed above with regard to claim 6. The Wei, Gang reference does not disclose,
"Transcript information comprises closed-captioned text". The Wei, Qi reference 
discloses a similar situation to that of Wei, Gang, wherein the text information for 
classifying a news program comes in the form of closed-caption text [page 2, paragraph
2]. It would have been obvious to one of ordinary skill in the art at the time of the
invention to utilize closed-caption text as the transcript information for the program to be
classified, in order to utilize an already existing technology (closed captioning) that is
easy to parse and easy to work with for the purposes of classification.

Conclusion 

5. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Michael R. Shannon, who can be reached at (571) 272-7356
or Michael.Shannon@uspto.gov. The examiner can normally be reached by
phone Monday through Friday 8:00 AM - 5:00 PM, with alternate Fridays off.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, John Miller, can be reached at (571) 272-7353. 

Any response to this action should be mailed to:

Please address mail to be delivered by the United States Postal Service (USPS) 
as follows: 

Mail Stop 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

Effective January 14, 2005, except correspondence for Maintenance Fee
payments, Deposit Account Replenishments (see 1.25(c)(4)), and Licensing and
Review (see 37 CFR 5.1(c) and 5.2(c)), please address correspondence to be delivered
by other delivery services (Federal Express (Fed Ex), UPS, DHL, Laser, Action,
Purolator, etc.) as follows:



United States Patent and Trademark Office 
Customer Service Window 
Randolph Building 
401 Dulany Street 
Alexandria, VA 22314

Some correspondence may be submitted electronically. See the Office's Internet 
Web site http://www.uspto.gov for additional information.

Or faxed to: (571) 273-8300



Hand-delivered responses should be brought to: 

Randolph Building 
401 Dulany Street 
Alexandria, VA 22314

Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to customer service whose telephone number is (571)
272-2600.



Michael R Shannon 

Examiner 

Art Unit 2614 



Michael R Shannon 
November 18, 2005 



JOHN MILLER
SUPERVISORY PATENT EXAMINER 
TECHNOLOGY CENTER 2600 




